Famous Men and Women on Wikipedia

How do the Wikipedia pages of famous men differ from those of famous women?

Introduction

The main question that we set out to answer in this project was the following: How do the Wikipedia pages of famous women differ from those of famous men?

To answer this question, we first generated a list of the top 5 'most famous' women and men using Google's PageRank. We chose PageRank as our measure of fame because we wanted a source extrinsic to Wikipedia, to avoid biases that Wikipedia's own metrics may have (although such biases are of course also possible in Google's algorithm). The top 10 most famous men and women based on this metric are listed below:

Rank  Women                    Men
1     Elizabeth II             Napoleon
2     Queen Victoria           Barack Obama
3     Mary (mother of Jesus)   George W. Bush
4     Elizabeth I              William Shakespeare
5     Margaret Thatcher        Jesus
6     Madonna (entertainer)    Adolf Hitler
7     Hillary Clinton          Franklin D. Roosevelt
8     Catherine the Great      Aristotle
9     Beyonce                  Bill Clinton
10    Britney Spears           Ronald Reagan

We were interested not in the specific content of the pages but in features of these pages and in user interaction with them. With this in mind, we chose to investigate the question by looking at the following attributes:

  • The number of backlinks (links to the Wikipedia pages from outside pages)
  • The number of revisions per page
  • The size of revisions to these pages
  • The number of unique editors per page
  • The amount of text per page
  • The language used on these pages (main pages, as well as talk pages)
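
As a rough illustration of how the revision-based attributes in this list could be computed, here is a minimal sketch. It is not the code used for this project: the record layout (a list of dicts with 'user' and 'size' keys, ordered oldest first) and the function name summarize_revisions are assumptions made for the example, and the toy history is invented rather than taken from Wikipedia.

```python
from collections import Counter

def summarize_revisions(revisions):
    """Summarize one page's revision history.

    `revisions` is a list of dicts ordered oldest first, each holding the
    editor's 'user' name and the page 'size' in bytes after that revision.
    This record layout is a hypothetical one chosen for the sketch.
    """
    edits_per_editor = Counter(rev["user"] for rev in revisions)

    # Size of a revision = change in page size relative to the previous one
    # (the first revision is measured against an empty page).
    sizes = [rev["size"] for rev in revisions]
    deltas = [curr - prev for prev, curr in zip([0] + sizes[:-1], sizes)]

    n = len(revisions)
    return {
        "num_revisions": n,
        "num_unique_editors": len(edits_per_editor),
        "mean_abs_revision_size": sum(abs(d) for d in deltas) / n if n else 0.0,
        "max_edits_by_one_editor": max(edits_per_editor.values(), default=0),
    }

# Toy history, invented purely for illustration (not real Wikipedia data).
history = [
    {"user": "A", "size": 100},
    {"user": "B", "size": 150},
    {"user": "A", "size": 120},
    {"user": "C", "size": 200},
]
summary = summarize_revisions(history)
print(summary)
```

Comparing num_revisions with num_unique_editors and max_edits_by_one_editor gives a quick sense of whether a page's edits are spread across many editors or concentrated in a few.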

The top 5 most famous men and women on Wikipedia

We first decided to look at the Wikipedia pages of only the top 5 men and women.

One attribute we looked at was backlinks (the number of links to a given Wikipedia page from outside pages). We intended to use this feature as one measure of the page's popularity, specifically its popularity outside of Wikipedia. Results from the top 5 pages (for men and women) show that, in general (with the exception of the most popular man and woman), the pages of famous men have more backlinks.

We next looked at the number of revisions per page (over the lifetime of the page). The results show a trend similar to that of the backlinks: the number of revisions to the Wikipedia page of a famous man is greater than the number of revisions to the Wikipedia page of a famous woman of equal 'fame' (PageRank).

We then looked at the number of unique editors per page (over the lifetime of the page). We wanted to see if the observed discrepancy in the number of revisions could be explained by a small number of editors making multiple edits. However, when we plotted the number of unique editors, we saw the same trend as we observed for revisions: a greater number of unique editors for the pages of men with a given rank compared to the pages of women with equal rank. Thus, it seems that there are simply more edits and more editors for these men's pages.

Finally, we looked at the amount of text (number of words) per page. The results generally show that the length of text for the Wikipedia page of a famous man is greater than the length of text for the Wikipedia page of a famous woman of equal 'fame' (PageRank).

The distribution of broad categories among famous men and women is also interesting, and revealing about the types of professions that make men and women famous. For example, while the most common categories for famous men were Political (Historical) (pre-1900) (31%) and Political (Current) (post-1900) (28%), the most common category for famous women was Artists and Celebrities (Current) (post-1900) (51%).

Analysis for the top 100 most famous men and women

Intrigued by our results from the top 5 most famous men and women, we decided to look at the same features for a larger population (100 men and 100 women) to see if the trends that we observed generalize. One reason for repeating the analysis with more people was that the top 5 most 'famous' men and women (according to PageRank) represented rather specific categories, namely British royalty and politicians (Elizabeth II, Queen Victoria, Elizabeth I, Margaret Thatcher) and religious figures (Jesus, Mary), which were not representative of the full top 100 list. To determine whether the results that we observed reflected differences between the pages of famous men and women on Wikipedia, or were instead biased by other specific attributes of these 10 people, we expanded our analysis to 100 men and 100 women, who represented a more diverse group. You can see the top 100 men and women broken down into broad categories below:

Number of Backlinks

The mean number of backlinks to the Wikipedia pages of famous men is significantly greater than the mean number of backlinks to the Wikipedia pages of famous women (p<0.001).
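
This write-up does not say which statistical test produced the p-values in this section. As one possible way to compare two group means, here is a minimal sketch of a one-sided permutation test using only the Python standard library. The function name permutation_test, its parameters, and the sample numbers are all assumptions for illustration; the numbers are invented and are not our measurements.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """One-sided permutation test for mean(group_a) > mean(group_b).

    Returns the observed difference in means and an estimated p-value:
    the share of random relabelings whose difference in means is at
    least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            at_least_as_large += 1

    # The add-one correction keeps the estimate away from an impossible p = 0.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value

# Invented placeholder counts, for illustration only (not real backlink data).
men = [520, 610, 480, 700, 650]
women = [300, 350, 410, 290, 380]
observed, p_value = permutation_test(men, women)
print(observed, p_value)
```

A permutation test makes no assumption about the shape of the distributions, which may suit count data such as backlinks or revisions that tend to be strongly skewed. The same function could be applied to the revision, editor, and word counts.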

Distribution (plotted as a histogram)

Number of Revisions

The mean number of revisions to the Wikipedia pages of famous men is significantly greater than the mean number of revisions to the Wikipedia pages of famous women (p=0.0001).

Number of Unique Editors

The mean number of unique editors of the Wikipedia pages of famous men is significantly greater than the mean number of unique editors of the Wikipedia pages of famous women (p<0.0001).

Amount of Text (Number of Words) per Page

The mean number of words per page for the Wikipedia pages of famous men is significantly greater than the mean number of words per page for the Wikipedia pages of famous women (p<0.0001).

Conclusions

Future Directions